Chapter 19:
Poisson regression, where the outcome is the number of events that occur in an interval of time
Nonlinear least-squares regression, where the relationship between the predictors and the numerical
outcome can be more complicated than a simple summation of terms in a linear model
LOWESS curve-fitting, where you fit a smooth curve to your data without specifying a formula in advance
Finally, Part 5 ends with Chapter 20, which provides guidance on the mechanics of regression
modeling, including how to develop a modeling plan, and how to choose variables to include in
models.
A Matter of Life and Death: Working with Survival Data
Sooner or later, everyone dies, and in biological research, it becomes especially important to
characterize that sooner-or-later part as accurately as possible using survival analysis techniques. But
characterizing survival can get tricky. It’s possible to say that patients may live an average of 5.3 years
after they are diagnosed with a particular disease. But what is the exact survival experience? Imagine
you do a study with patients who have this disease. You may ask: Do all patients tend to live around
five or six years, or do half the patients die within the first few months, and the other half survive ten
years or more? And what if some patients live longer than the observational period of your study?
How do you include them in your analysis? And what about participants who stopped returning calls
from your study staff? You do not know if these dropouts went on to live or die. How do you include
their data in your analysis?
The need to study survival with data like these led to the development of survival analysis
techniques. But survival analysis is not only intended to study the outcome of death. You can use
survival analysis to study the time to the first occurrence of non-death events as well, like
remission or recurrence of cancer, the diagnosis of a particular condition, or the resolution of a
particular condition. Survival analysis techniques are presented in Part 6.
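To make the idea of incomplete follow-up concrete, here is a minimal Python sketch of one common way to handle it: the Kaplan-Meier (product-limit) estimate of the survival curve. This is only an illustration, not necessarily the approach Part 6 takes. The function name kaplan_meier, the follow-up times, and the event flags are all hypothetical. Each participant contributes a time and a flag: 1 if the event was observed, and 0 if the participant was censored (the study ended, or the participant dropped out, before the event was seen).

```python
def kaplan_meier(times, events):
    """Return a list of (time, estimated survival probability) pairs,
    one pair for each distinct time at which an event was observed.

    times  -- follow-up time for each participant
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)      # participants still being followed
    survival = 1.0           # running product-limit estimate
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0         # events seen at time t
        leaving = 0          # everyone leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        # Censored participants leave the risk set without lowering the curve.
        at_risk -= leaving
    return curve


# Hypothetical follow-up times in years (made-up numbers for illustration).
times = [0.5, 1.0, 2.0, 3.0, 4.0, 6.0]
events = [1, 0, 1, 1, 0, 1]   # 1 = event observed, 0 = censored

for t, s in kaplan_meier(times, events):
    print(f"year {t}: estimated survival {s:.3f}")
```

Notice that the censored participants are not thrown away. They count in the at-risk group for as long as they were followed, and then quietly drop out of the denominator.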
Getting to Know Statistical Distributions
Statistics books always contain tables, so why should this one be any different? Back in the
not-so-good old days, when analysts had to do statistical calculations by hand, they needed to use tables of the
common statistical distributions to complete the calculation of the significance test. They needed tables
for the normal distribution, Student t, chi-square, Fisher F, and others. Now, software does all this for
you, including calculating exact p values, so these printed tables aren’t necessary anymore.
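As a small illustration of what software does in place of a printed table, the Python sketch below computes an exact two-sided p value for a test statistic that follows the standard normal distribution. It uses only the standard library. The function name two_sided_p_from_z is hypothetical, and the example assumes a z statistic rather than a t, chi-square, or F statistic, which need other distribution functions.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p value for a standard normal test statistic z.

    Uses the identity P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))


# A z value of 1.96 is the familiar cutoff that a printed normal
# table pairs with a two-sided p value of about 0.05.
p = two_sided_p_from_z(1.96)
print(f"p = {p:.4f}")
```

A printed table would give you only a handful of cutoffs, but a function like this one returns the p value for any test statistic you hand it.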
But you should still be familiar with the common statistical distributions that may describe the
fluctuations in your data, or that may be referenced in the course of performing a statistical calculation.
Chapter 24 contains a list of commonly used distribution functions, with explanations of where you can